Limit points of the iterative scaling procedure 

Erik Aas 



Abstract 

The iterative scaling procedure (ISP) is an algorithm which computes 
a sequence of matrices, starting from some given matrix. The objective is 
to find a matrix 'proportional' to the given matrix, having given row and 
column sums. In many cases, for example if the initial matrix is strictly 
positive, the sequence is convergent. In the general case, it is known 
that the sequence has at most two limit points. When these are distinct, 
convergence can be slow. We give an efficient algorithm which finds these 
limit points, invoking the ISP only on instances for which the procedure 
is convergent. 

Introduction 

The iterative scaling procedure (ISP) is an algorithm which, given an 
m x n entry wise nonnegative matrix A and positive numbers r%, . . . ,r m , 
ci, . . . , c n attempts to find a matrix diagonally equivalent to A, having 
row sums ri and column sums Cj. Two matrices A and A' are diagonally 
equivalent if there are strictly positive numbers xi,..., x m , yi,. . . ,y n such 
that Oij = Xia^yj for all 

The ISP has been applied in a variety of contexts, the most interesting 
of which perhaps being the ranking of webpages $ . A discrete version 
of the algorithm is used by the Zurich City Council to distribute seats in 
parliamentary elections [5]. 

We proceed by defining the ISP. Throughout, A will denote a fixed 
nonnegative m x n matrix, and ri, . . . , r m , c\, . . . , c n fixed positive num- 
bers. We further assume that there is no row or column in A containing 
only zeros. All matrices considered will be nonnegative entry by entry and 
denoted by capital letters. Matrix entries will be denoted by the corre- 
sponding lower case letters. For example, the entry in A is denoted 
aij. By the row adjustment (to n, . . . , r m ) 1Z(A) of A we mean the matrix 
whose (i,j) entry is entries XiOij, where Xi — ^ r ' a , and we define the 

column adjustment C(A) (to ci, . . . , c n ) similarly. The numbers Xi will be 
referred to as row multipliers. Note that 1Z(A) = A = C(A) in case A has 
both the desired row and column sums. 

The iterative scaling procedure consists of adjusting rows and columns 
alternatingly, starting with A. The iterates under the scaling procedure 
are defined to be B(k) := TZ(C(k - 1)), C{k) = C(B(k)) for k > 1 and 

B{i) = n{A). 
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It is known that the sequences B(k) and C(k) are convergent [I], and 
that if there is any matrix D with both the desired row sums and column 
sums, with the property that a,ij = =>■ dij = 0, then those two limits are 
equal (pQ, see also [4]). If there is no such matrix D, the limits can clearly 
not be equal. In fact, if the limits are not equal, but ri — Y2j c j 5 then 
the support of the limit points are not equal to the support of the initial 
matrix, that is, some entries in the matrix tend to zero during the ISP. 
The objective of this note is to describe an algorithm which finds these 
entries efficiently (in time less than quadratic in input size - using the ISP 
itself to find these entries can require arbitrarily many steps, though not 
on realistic instances). Proving that the algorithm works is non-trivial, 
and gives some insight into the structure of the limit points. 

We will use the following observation in what follows. If we scale 
the desired row sums by a common factor t, then this gives new ISP 
sequence B'(k),C'(k) closely related to the original sequence; we have 
B'(k) = tB(k) and C'{k) = C(k). 

Acknowledgements 

I am very thankful to Fabian Reffel who read an early version of this 
note, finding a serious error and providing many helpful suggestions. I 
would also like to express my gratitude to Kai-Friedrike Oelbsmann and 
the Augsburg group headed by Prof. Friedrich Pukelsheim for telling me 
about the present problem and electoral methods in general. 



An example 



We choose the following initial data, writing the desired row and column 
sums on the borders of the matrix A. 
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After 1000 iterations we get the following matrices: 



B(500) 



C(500) = 
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Now we give some further examples which will be useful when reading 
the next sections, using notation and terminology introduced there. 
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We have = (7i,Ji) where h = {1,2}, Ji = {1,2}. One can 

check that we have <fi(A, r, c) = ^(B) in this case, as expected. 

In step I of the algorithm, we find the splitting consisting of the blocks 
(Ji, Ji) and (7 2 , J 2 ) where 7 2 = {3, 4} and J 2 = {3, 4}. 

In step II applied to the block (7i, Ji) we find the subset 7a = {1} C 7i 
with the property r(7 3 ) = 6 = ^4 = ^jc(iV(7 3 ) n Ji). So the result of 
applying step II to (7i, Ji) is (7 3 , J 3 ) and (7 4 , J4) := (7i\7 3 , Ji\J 3 ). Ap- 
plying step II to any of the blocks (7 2 , J 2 ), (7 3 , J 3 ) or (74, J4) yields noth- 
ing new, so the final splitting found is {(7 2 , J 2 ), (7 3 , J 3 ), (74, J4)}, which 
coincides with the decomposition of B. 

Limit points 

Let (for the remainder of this note) B, C denote the limits of the sequences 
B(k), C{k). 

By a splitting S we mean a set of pairs (7, J) of sets of row respectively 
column indices, with the property that the sets {7 : 3 J : (7, J) £ S} form 
a partition of some underlying set of rows (which will always be the set of 
rows of A below) and a similar statement holds for the sets J. We call any 
pair 7, J of rows respectively columns a block. An elementary refinement 
of a splitting consists of replacing a pair (7, J) with (7i, Ji) and (7 2 , J 2 ) 
such that 7i,7 2 partition 7 and Ji, J 2 partition J. If the splitting S' 
is obtained by performing a sequence of elementary refinements on the 
splitting S then we say that 5" is a refinement of S. 

By the decomposition of a matrix B we will mean the splitting 
7i, . . . , 7 r , Ji, . . . , J r of the row and column sets of B such that bij =fc =>■ 
i £ Ik and j G Jk for some fe, minimal with respect to refinement. 

In this section we will describe the decomposition of B. Since the 
decomposition of B and of C coincide, we will only mention the decom- 
position of B in what follows. 

Let S(A) := ■ a,ij 7^ 0} denote the support of A. By definition, 

S(A) = S(TZ(A)) = S(C(A)). Clearly S(B) = S(C) C S(A),C{B) = C, 
and 11(C) = B. 

For subsets of rows 7, define r(7) := 5Z ieJ n and c(J) similarly for 
subsets J of columns. 

Let Si and be such that Xibij = dj and CijUj = bij. Hence 
bij and thus Xiyj = 1 whenever (i, j) € 5(73). Let 7i, . . . , 7 r , Ji, . . . , J r be 
the decomposition of 73. Then a?j is constant for i £ Ik for each k and i/j is 
constant in Jk for each fc, and if i € Ik, j € Jk we have Xi = 1/t/j = ^j^j- 

First, note that for each k, the submatrix 73[7fc, Jk] is a matrix with row 
sums n and column sums ^j^Cj. Similarly, C[Ik, Jk] has column sums Cj 
and row sums j^j r i- Whenever k / Z, we have 73[7 fe , J ; ] = C[7 fc , J ; ] = 0. 

We will say a block (7, J) is feasible if there is some 7 x J matrix M 
with row sums n, column sums ^jj c j and S(M) C S(A). A splitting is 
feasible if all its blocks are feasible. So the decomposition of B is clearly 
feasible. 

The quotient of a block (7, J) is the number r(I)/c(J). 
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The first block 



We now give some definitions and lemmas needed for describing the algo- 
rithm. 

Lemma 1. Let pi, . . . , p n , q%, . . . , q n be positive real numbers. Then 
mm i ^ < < max, £f . If any of the two inequalities is in fact an 

equality, then all the Pi/qt are equal. 

Proof. This follows by induction, the case n — 2 being easy. □ 

It is important to note that mhiiPi/ maXjPj < (pi + ••• + p n )/(qi + 
■ ■ • + ?n) is a considerably weaker statement than Lemma [1] 

We will use the following simple generalization of a well-known theo- 
rem by Philip Hall (see eg. [2]). 

Lemma 2. Suppose ri + • • • + r m = t(ci + ■ ■ ■ + c n ) where t is some 
positive real number. There is a matrix B with S(B) C S(A) and row 
sums Ti and column sums tCj if and only if there is no subset 7 of rows 
such that r(7) > tc(N(I)), where N(I) = {j £ [n] : 3i : ay / 0}. 

For initial data A, r, c, we define a subset <p(A, r, c) of rows. In fact 
<p(A, r, c) will depend only on the support of A, and not on the non-zero 
values themselves. 

We define <p(A, r, c) as the subset I of rows such that r(I)/c(N(I)) 
is maximal and #7 is maximal among those 7 maximizing r(I)/c(N(I)). 
Let us prove that (^(^4, r , c) is well defined. 

Lemma 3. If 7i and 72 satisfy the definition of <p(A, r, c), then 7i = 72. 

Proof. We have r(7i) + r(7 2 ) = r(7i U 7 2 ) + r(7i n 7 2 ) and c(N(h)) + 
c(N(I 2 )) = c(N(h) UN(I 2 ))+c(N(h)nN(I 2 )) > c(N(hUl 2 )) + c(N(hn 
h)). Therefore 

r(Ii) + r(I 2 ) r(h U h) + r(h n 7 2 ) 

c(JV(/i)) + c(JV(/ 2 )) " c(JV(Ji U 7 a )) + c(iV(7r n 7 2 )) ' 
By Lemma [1] either 7i U 72 or 7i PI 72 shows that neither 7i nor 72 can 
satisfy the definition of tp(A, r,c). □ 

Using linear programming, it is easy to compute <p(A,r,c). 
Though it will follow from proposition [T] it is interesting to note that 
we can prove directly that (7, N(I)) is feasible, where 7 = ip(A, r, c). 

Lemma 4. Let 7 = ip(A,r,c). Then the block (7, JV(7)) is feasible. 

Proof. Suppose (7,iV(7)) is not feasible. By Lemma[2]there is then some 
7' C 7 such that r(7') > _^ C (JV(/')), or > But this 

contradicts the choice of 7. □ 

Consider the decomposition T>' of 7? and denote by ^{B) the block 
obtained by merging all blocks with maximal quotient into a single block 
(which will have this same quotient). We will denote by T> the decompo- 
sition obtained from T>' after this merge. Of course, the blocks in T> are 
all feasible (since this is true for T>'). 
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Proposition 1. Let 7 = ip(A,r,c). We have (I,N(I)) = *(73). 



Proof. Let (h,Ji) = *(73). We wish to prove that h = I and Ji = 7V(J). 
We do this in five steps. 

. Ji = N(h). 

Suppose this is not the case. Then there are p 6 Ji, q £ Ji such 
that a pg 7^ 0. Denote by (72, J2) the block in the decomposition of 
B such that q G J2. 

Let j/j(fc) be the column multipliers used when computing 
C(B(k)) = C(k) and Xi(k) the row multipliers used when computing 
n{C{B{k))) = B(k + 1). We know that x p (k) -> and y q (k) -> 

as k -»■ co. Therefore z p (fc)y,(fc) -> ^ > 1. Choose 
77 > and if such that x p (k)y q (k) > 1 + 77 for all k > K. Hence 
b pq (K + n) > (1 + rj) n b pq (K) -} 00 as n -> 00. This contradicts 
the fact that all entries in the B(k) are bounded by max(ri + ■ ■ ■ + 
r m , ci H + c»). 

. r(/)/c(/V(/)) = r(Ji)/c(Ji) and 
• 7 C h. 

The proofs of these two statements are similar so we do them si- 
multaneously. It follows from the previous step that r(I)/c(N(I)) > 
r(h)MJi). 
Note that 

r(i) _ Ec/voK^nJ') E ( ^, JO r(JnJ') 

— „( \T(T r> Tl\ n TA V ' 



<W)) E ( rvo c(/v(7) n J') - E ( r,j n /') n J') 

where the sums range over all (/', J') G V with 7 n 7' 7^ 0. 
Since 7i, Ji has maximal quotient in T>, for any (7', J') G 7?, r(7 n 
I')/c(N(I n 7') n J') <r(7i)/c(Ji) <r(I)/c(N(I)). By the formula 
above and Lemma [Q all the numbers r(7 n I')/c(N(I n 7') n J') 
therefore have to be equal, and this common value is r(I)/c(N(I)). 
Now, suppose r(7i)/c(7V(7i)) < r(7)/c(7V(7)). Take any term (7', 
J') in ©. We then have c(N ^ nJI) = ^ > ^ > rg. 
But by LemmaU this contradicts (7', J') being feasible. 
So r(7i)/c(iV(7i)) = r(I)/c(N(I)). Using a similar argument as in 
the previous paragraph, if we have any 7' 7^ 7 occurring in (fl) , then 
this will contradict (7', J') being feasible. Thus the only term in (JTJ) 
is c^un/onj!) ' and consequently 7 C 7i. 
7 = /i 

This follows directly from the previous two steps and lemma [3] 
N(I) = ,h 

We have Ji = 7V(7i) = N(I). 

□ 
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The algorithm 



The algorithm consists of two steps, I and II. We first describe step I. 

The output of step I is the splitting given by the blocks (Ii, Ji), . . . , 
(I r ,Jr) where (h,Ji) = tp(A,r,c), (h,J2) = f(A[I', J'], r\ V , c\ ji) with 
I' = [m] — I, J' = [n] — J and so on. Here we think of r and c as 
functions, and r^i et.c. denotes restriction. 

It follows from Lemma [T] that the decomposition of B is a refinement 
of the splitting S obtained in step I. 

Step II consists of finding this refinement. This is implicit already in 
the work of Pretzel [4], but we give a description here for completeness. 
It is sufficient to describe step II for a single block of S, which we assume 
to be all of A for ease of notation. Using linear programming we can 
determine whether there is a proper subset / C [m] such that r(I) = 
^^yc(jV(/)). In case there are none, the output is just the original block. 
In case there is such a block I, we output (/, N(I)) and ([m]—I, [n] — N(I)) 
and apply step II recursively to these two blocks. 

To show that this indeed generates T>, it is sufficient to prove the fol- 
lowing lemma, where we have assumed r([m]) = c([n]) for ease of notation. 

Lemma 5. If the only I satisfying r(7) > c(N(I)) are 7 = and [m], 
then S{A) = S(B). 

Proof. Suppose (p,q) € S(B)\S(A). By theorem 1 in [4], the support of 
B is the largest possible among all matrices B' with row sums r» column 
sums Cj and satisfying S(B') C S(A). 

Therefore, for any e > 0, there is no matrix with support a subset of 
S(A), row sums r[ and column sums c'j, where r[ = r% for i 7^ p, r[ = n — e 
and c'j = Cj for j 7^ q and c' q = c q — e. By lemma [2] this means there is 
some proper subset I' = /'(e) of rows such that r'(I') > c'(N(I')). Letting 
e — > (and observing that the number of possible subsets /' is finite) this 
gives us a proper subset /' with r(I') = c(N(I')), a contradiction. □ 

Now, again referring to [4], if change A by setting the entries in 
S(A)\S(B) to 0, the ISP limit points will not change. However numerical 
experiments suggest that convergence is much quicker than without the 
change. 

As a small example of this, take the matrix from the example above, 
set the entries outside the splitting found ((1,3), (1,4), (2, 1), (2,3), and 
(2,4)) to zero. Then it takes about 3 iterations to get as close to (B,C) 
in the example, as _B(500) and C(500) from earlier are from (B,C). 

Future work 

I would like to mention some related extensions and problems. 

First, most results above seem to carry over to the much more general 
setting of Theorem 5.2 in Q], but I have not explored this further. 

Second, there is a natural version of the ISP with continuous time, as 
follows. Let 1? £ [0,1], and define 1Z#(A)ij = xfAij, and C#(A)y = 
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Aijfjj . Now we can define F a ,p = R a Cp, and ask about the prop- 
erties of lim e ^o lim n _nx) F^g e (A). The matrix F a> p(A) wiil be diago- 
naliy equivaient to A and from this it follows (cf. [J]) that the limit 
lim e ^o lim n _ ! . 00 F^ Be (A) will be the same as the ordinary ISP limit of A 
if the latter exists. It is not clear what happens in the general case when 
that limit does not exist. 

Finally, the ISP can of course be defined for arbitrary, not necessarily 
nonnegative, A, r, c. Two issues arise in this case. One problem is that 
we may obtain matrices having marginals equal to during the iteration. 
This could probably be avoided by using the continuous version described 
above; it seems reasonable such a system would repel from matrices hav- 
ing some marginal close to 0. Also, it will not be possible to prove the 
analogous statements about the limit points, since they are not true. For 
example, applying ISP (with discrete time) to the initial data 
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gives a sequence with period 4. This cycle appears to be unstable. 
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